Lab Test
Evaluating Prompting Strategies with MedGemma for Medical Order Extraction
Balachandran, Abhinand, Durgapraveen, Bavana, Sudhagar, Gowsikkan Sikkan, S, Vidhya Varshany J, Rajkumar, Sriram
The accurate extraction of medical orders from doctor-patient conversations is a critical task for reducing clinical documentation burdens and ensuring patient safety. This paper details our team's submission to the MEDIQA-OE-2025 Shared Task. We investigate the performance of MedGemma, a new domain-specific open-source language model, for structured order extraction. We systematically evaluate three distinct prompting paradigms: a straightforward one-shot approach, a reasoning-focused ReAct framework, and a multi-step agentic workflow. Our experiments reveal that while more complex frameworks like ReAct and agentic flows are powerful, the simpler one-shot prompting method achieved the highest performance on the official validation set. We posit that on manually annotated transcripts, complex reasoning chains can lead to "overthinking" and introduce noise, making a direct approach more robust and efficient. Our work provides valuable insights into selecting appropriate prompting strategies for clinical information extraction under varied data conditions.
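The contrast between the two main strategies can be sketched as prompt templates. This is a hypothetical illustration: the order schema, wording, and `build_prompt` helper are assumptions, not the authors' actual templates.

```python
# Hypothetical prompt templates contrasting one-shot and ReAct-style prompting
# for medical order extraction. The schema and wording are invented for illustration.

ONE_SHOT_TEMPLATE = """You are a clinical scribe. Extract medical orders as JSON.

Example transcript:
Doctor: Let's get a CBC today and start lisinopril 10 mg daily.
Orders: [{{"type": "lab", "name": "CBC"}},
         {{"type": "medication", "name": "lisinopril", "dose": "10 mg daily"}}]

Transcript:
{transcript}
Orders:"""

REACT_TEMPLATE = """You are a clinical scribe. Think step by step, then act.

Thought: identify every order the doctor commits to in the transcript.
Action: extract_orders
Observation: {transcript}
Thought: now list each order with its type and details.
Final Answer (JSON):"""

def build_prompt(transcript: str, strategy: str = "one_shot") -> str:
    """Return the prompt for the chosen strategy ('one_shot' or 'react')."""
    template = ONE_SHOT_TEMPLATE if strategy == "one_shot" else REACT_TEMPLATE
    return template.format(transcript=transcript)
```

The one-shot variant asks for the answer directly with a single worked example, while the ReAct variant interleaves reasoning and action steps; the paper's finding is that the extra reasoning scaffolding did not pay off on this task.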
- Health & Medicine > Therapeutic Area (0.70)
- Health & Medicine > Diagnostic Medicine > Lab Test (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.47)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.35)
MACD: Multi-Agent Clinical Diagnosis with Self-Learned Knowledge for LLM
Li, Wenliang, Yan, Rui, Zhang, Xu, Chen, Li, Zhu, Hongji, Zhao, Jing, Li, Junjun, Li, Mengru, Cao, Wei, Jiang, Zihang, Wei, Wei, Zhang, Kun, Zhou, Shaohua Kevin
Large language models (LLMs) have demonstrated notable potential in medical applications, yet they face substantial challenges in handling complex real-world clinical diagnoses using conventional prompting methods. Current prompt engineering and multi-agent approaches typically optimize isolated inferences, neglecting the accumulation of reusable clinical experience. To address this, this study proposes a novel Multi-Agent Clinical Diagnosis (MACD) framework, which allows LLMs to self-learn clinical knowledge via a multi-agent pipeline that summarizes, refines, and applies diagnostic insights. It mirrors how physicians develop expertise through experience, enabling more focused and accurate diagnosis on key disease-specific cues. We further extend it to a MACD-human collaborative workflow, where multiple LLM-based diagnostician agents engage in iterative consultations, supported by an evaluator agent and human oversight for cases where agreement is not reached. Evaluated on 4,390 real-world patient cases across seven diseases using diverse open-source LLMs (Llama-3.1 8B/70B, DeepSeek-R1-Distill-Llama 70B), MACD significantly improves primary diagnostic accuracy, outperforming established clinical guidelines with gains up to 22.3%. In direct comparison with physician-only diagnosis under the same evaluation protocol, MACD achieves comparable or superior performance, with improvements up to 16%. Furthermore, the MACD-human workflow yields an 18.6% improvement over physician-only diagnosis, demonstrating the synergistic potential of human-AI collaboration. Notably, the self-learned clinical knowledge exhibits strong cross-model stability, transferability across LLMs, and capacity for model-specific personalization. This work thus presents a scalable self-learning paradigm that bridges the gap between the intrinsic knowledge of LLMs and real-world clinical practice.
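The escalation logic of the MACD-human workflow, as the abstract describes it, can be sketched as follows. The agent internals, the unanimity rule, and the round limit are illustrative assumptions standing in for the paper's evaluator agent and LLM diagnosticians.

```python
# Minimal sketch of the MACD-human escalation loop: several diagnostician agents
# propose a diagnosis, an evaluator checks agreement, and unresolved cases are
# routed to a human. Agents are stub callables, not real LLM diagnosticians.
from collections import Counter

def consult(case: str, diagnosticians, max_rounds: int = 3):
    """Return (diagnosis, resolved_by), where diagnosticians are callables case -> str."""
    for _ in range(max_rounds):
        votes = Counter(d(case) for d in diagnosticians)
        diagnosis, count = votes.most_common(1)[0]
        if count == len(diagnosticians):   # evaluator: unanimous agreement
            return diagnosis, "agents"
    return diagnosis, "human"              # no consensus: human oversight

# Stub agents standing in for LLM diagnosticians.
agents = [lambda c: "T2DM", lambda c: "T2DM", lambda c: "T2DM"]
print(consult("elevated HbA1c, polyuria", agents))  # ('T2DM', 'agents')
```

The actual framework also accumulates self-learned diagnostic knowledge across cases, which this stub omits.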
- Asia > China > Anhui Province > Hefei (0.04)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- North America > United States (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Evaluation of Causal Reasoning for Large Language Models in Contextualized Clinical Scenarios of Laboratory Test Interpretation
Bhasuran, Balu, Prosperi, Mattia, Hanna, Karim, Petrilli, John, Washington, Caretia JeLayne, He, Zhe
This study evaluates causal reasoning in large language models (LLMs) using 99 clinically grounded laboratory test scenarios aligned with Pearl's Ladder of Causation: association, intervention, and counterfactual reasoning. We examined common laboratory tests such as hemoglobin A1c, creatinine, and vitamin D, and paired them with relevant causal factors including age, gender, obesity, and smoking. Two LLMs - GPT-o1 and Llama-3.2-8b-instruct - were tested, with responses evaluated by four medically trained human experts. GPT-o1 demonstrated stronger discriminative performance (AUROC overall = 0.80 +/- 0.12) compared to Llama-3.2-8b-instruct (0.73 +/- 0.15), with higher scores across association (0.75 vs 0.72), intervention (0.84 vs 0.70), and counterfactual reasoning (0.84 vs 0.69). Sensitivity (0.90 vs 0.84) and specificity (0.93 vs 0.80) were also greater for GPT-o1, with reasoning ratings showing similar trends. Both models performed best on intervention questions and worst on counterfactuals, particularly in altered outcome scenarios. These findings suggest GPT-o1 provides more consistent causal reasoning, but refinement is required before adoption in high-stakes clinical applications.
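The AUROC figures reported above have a simple rank-based definition that is worth making concrete. Below is a dependency-free implementation via the Mann-Whitney formulation; the labels and scores in the example are invented, not the study's data.

```python
# Dependency-free AUROC: the probability that a randomly chosen positive
# outranks a randomly chosen negative, with ties counted as 1/2.

def auroc(labels, scores):
    """labels: 0/1 ground truth; scores: model confidence for the positive class."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
print(round(auroc(labels, scores), 3))  # 0.833
```

In the study's setting, the "scores" would come from expert ratings of model answers, which is why the paper can report AUROC with a standard deviation across question categories.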
- North America > United States > Florida > Hillsborough County > Tampa (0.14)
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- North America > United States > Florida > Leon County > Tallahassee (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- Health & Medicine > Diagnostic Medicine > Lab Test (1.00)
- Health & Medicine > Consumer Health (1.00)
Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios
Xu, Shaochen, Zhou, Yifan, Liu, Zhengliang, Wu, Zihao, Zhong, Tianyang, Zhao, Huaqin, Li, Yiwei, Jiang, Hanqi, Pan, Yi, Chen, Junhao, Lu, Jin, Zhang, Wei, Zhang, Tuo, Zhang, Lu, Zhu, Dajiang, Li, Xiang, Liu, Wei, Li, Quanzheng, Sikora, Andrea, Zhai, Xiaoming, Xiang, Zhen, Liu, Tianming
Artificial Intelligence (AI) has become essential in modern healthcare, with large language models (LLMs) offering promising advances in clinical decision-making. Traditional model-based approaches, including those leveraging in-context demonstrations and those with specialized medical fine-tuning, have demonstrated strong performance in medical language processing but struggle with real-time adaptability, multi-step reasoning, and handling complex medical tasks. Agent-based AI systems address these limitations by incorporating reasoning traces, tool selection based on context, knowledge retrieval, and both short- and long-term memory. These additional features enable the medical AI agent to handle complex medical scenarios where decision-making should be built on real-time interaction with the environment. Therefore, unlike conventional model-based approaches that treat medical queries as isolated questions, medical AI agents approach them as complex tasks and behave more like human doctors. In this paper, we study the choice of the backbone LLM for medical AI agents, which is the foundation for the agent's overall reasoning and action generation. In particular, we consider the emergent o1 model and examine its impact on agents' reasoning, tool-use adaptability, and real-time information retrieval across diverse clinical scenarios, including high-stakes settings such as intensive care units (ICUs). Our findings demonstrate o1's ability to enhance diagnostic accuracy and consistency, paving the way for smarter, more responsive AI tools that support better patient outcomes and decision-making efficacy in clinical practice.
- North America > United States > Georgia > Clarke County > Athens (0.14)
- North America > United States > Texas > Tarrant County > Arlington (0.14)
- North America > United States > Colorado > Adams County > Aurora (0.14)
- (11 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Lab-AI -- Retrieval-Augmented Language Model for Personalized Lab Test Interpretation in Clinical Medicine
Wang, Xiaoyu, Ouyang, Haoyong, Bhasuran, Balu, Luo, Xiao, Hanna, Karim, Lustria, Mia Liza A., He, Zhe
Accurate interpretation of lab results is crucial in clinical medicine, yet most patient portals use universal normal ranges, ignoring factors like age and gender. This study introduces Lab-AI, an interactive system that offers personalized normal ranges using Retrieval-Augmented Generation (RAG) from credible health sources. Lab-AI has two modules: factor retrieval and normal range retrieval. We tested these on 68 lab tests: 30 with conditional factors and 38 without. For tests with factors, normal ranges depend on patient-specific information. Our results show GPT-4-turbo with RAG achieved a 0.95 F1 score for factor retrieval and 0.993 accuracy for normal range retrieval. GPT-4-turbo with RAG outperformed the best non-RAG system by 29.1% in factor retrieval and showed 60.9% and 52.9% improvements in question-level and lab-level performance, respectively, for normal range retrieval. These findings highlight Lab-AI's potential to enhance patient understanding of lab results.

Introduction: The Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 played a key role in promoting the adoption and meaningful use of electronic health records (EHRs) throughout the U.S. healthcare system. Through the Medicare and Medicaid EHR Incentive Programs, the Act provided financial incentives that facilitated widespread EHR adoption.
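Lab-AI's two-module flow, as described in the abstract, can be sketched with a toy lookup standing in for retrieval. The knowledge base below is invented for illustration (the creatinine ranges are common textbook values), and the dictionary lookup stands in for the paper's RAG pipeline.

```python
# Hypothetical sketch of the two-module flow: first retrieve the factors a
# test's range depends on, then retrieve the range matching the patient's
# factor values. A toy dict stands in for RAG over credible health sources.

KB = {
    "creatinine": {
        "factors": ["gender"],
        "ranges": {("male",): "0.74-1.35 mg/dL", ("female",): "0.59-1.04 mg/dL"},
    },
    "vitamin D": {"factors": [], "ranges": {(): "20-50 ng/mL"}},
}

def retrieve_factors(test: str) -> list:
    """Module 1: which patient-specific factors condition this test's normal range?"""
    return KB[test]["factors"]

def retrieve_range(test: str, patient: dict) -> str:
    """Module 2: look up the range keyed by the patient's factor values."""
    key = tuple(patient[f] for f in retrieve_factors(test))
    return KB[test]["ranges"][key]

print(retrieve_range("creatinine", {"gender": "female"}))  # 0.59-1.04 mg/dL
```

The split into two modules mirrors the paper's evaluation: factor retrieval is scored with F1, and normal range retrieval with accuracy given the correct factors.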
- North America > United States > Florida > Hillsborough County > Tampa (0.14)
- North America > United States > Oklahoma > Payne County > Stillwater (0.04)
- North America > United States > Georgia > Fulton County > Johns Creek (0.04)
- North America > United States > Florida > Leon County > Tallahassee (0.04)
Blood test for male infertility could be on the horizon: AI can screen men with 74% accuracy - with no semen needed
Although the terms are often confused or used interchangeably, sperm and semen are not the same thing. Semen is the fluid that comes out of the penis, while sperm are the microscopic cells within the semen. Sperm cells are specialized for the task of fertilizing an egg. Semen analysis is considered essential for diagnosis of male infertility, but is not readily available at medical institutions other than those specializing in infertility treatment. 'Fertility specialists take it for granted that the first step in diagnosing male infertility is to perform a semen analysis,' Professor Kobayashi added.
ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance
Sun, Liwen, Agarwal, Abhineet, Kornblith, Aaron, Yu, Bin, Xiong, Chenyan
In the emergency department (ED), patients undergo triage and multiple laboratory tests before diagnosis. This time-consuming process causes ED crowding which impacts patient mortality, medical errors, staff burnout, etc. This work proposes (time) cost-effective diagnostic assistance that leverages artificial intelligence systems to help ED clinicians make efficient and accurate diagnoses. In collaboration with ED clinicians, we use public patient data to curate MIMIC-ED-Assist, a benchmark for AI systems to suggest laboratory tests that minimize wait time while accurately predicting critical outcomes such as death. With MIMIC-ED-Assist, we develop ED-Copilot which sequentially suggests patient-specific laboratory tests and makes diagnostic predictions. ED-Copilot employs a pre-trained bio-medical language model to encode patient information and uses reinforcement learning to minimize ED wait time and maximize prediction accuracy. On MIMIC-ED-Assist, ED-Copilot improves prediction accuracy over baselines while halving average wait time from four hours to two hours. ED-Copilot can also effectively personalize treatment recommendations based on patient severity, further highlighting its potential as a diagnostic assistant. Since MIMIC-ED-Assist is a retrospective benchmark, ED-Copilot is restricted to recommend only observed tests. We show ED-Copilot achieves competitive performance without this restriction as the maximum allowed time increases. Our code is available at https://github.com/cxcscmu/ED-Copilot.
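The sequential decide-test-or-diagnose loop at the core of ED-Copilot can be sketched as below. The policy stub, test names, per-test wait times, and stopping rule are illustrative assumptions; the paper learns the policy with reinforcement learning over a biomedical language model.

```python
# Sketch of a sequential lab-test suggestion loop: at each step the policy
# either orders another test (adding wait time) or commits to a diagnosis.
# The stub policy below stands in for the paper's learned RL policy.

def triage(patient: dict, policy, budget_minutes: int = 240):
    """Suggest tests one at a time until the policy stops or time runs out."""
    elapsed, ordered = 0, []
    while elapsed < budget_minutes:
        action = policy(patient, ordered)          # next test, or a diagnosis
        if action["kind"] == "diagnose":
            return ordered, action["label"], elapsed
        ordered.append(action["test"])
        elapsed += action["minutes"]               # each test adds wait time
    return ordered, "undetermined", elapsed

def stub_policy(patient, ordered):
    # Invented rule: order a troponin once, then commit.
    if "troponin" not in ordered:
        return {"kind": "test", "test": "troponin", "minutes": 60}
    return {"kind": "diagnose", "label": "low risk"}

print(triage({"chief_complaint": "chest pain"}, stub_policy))
```

The RL objective in the paper trades off exactly these two quantities: the accumulated wait time against the accuracy of the final outcome prediction, which is how it halves average wait time without losing accuracy.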
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > California > Alameda County > Berkeley (0.04)
- (3 more...)
- Health & Medicine > Diagnostic Medicine > Lab Test (0.76)
- Health & Medicine > Therapeutic Area > Immunology (0.68)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.72)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Enabling Patient-side Disease Prediction via the Integration of Patient Narratives
Su, Zhixiang, Zhang, Yinan, Jing, Jiazheng, Xiao, Jie, Shen, Zhiqi
Disease prediction holds considerable significance in modern healthcare, because of its crucial role in facilitating early intervention and implementing effective prevention measures. However, most recent disease prediction approaches heavily rely on laboratory test outcomes (e.g., blood tests and medical imaging from X-rays). Gaining access to such data for precise disease prediction is often a complex task from the standpoint of a patient and is always only available post-patient consultation. To make disease prediction available from patient-side, we propose Personalized Medical Disease Prediction (PoMP), which predicts diseases using patient health narratives including textual descriptions and demographic information. By applying PoMP, patients can gain a clearer comprehension of their conditions, empowering them to directly seek appropriate medical specialists and thereby reducing the time spent navigating healthcare communication to locate suitable doctors. We conducted extensive experiments using real-world data from Haodf to showcase the effectiveness of PoMP.
- Asia > Singapore > Central Region > Singapore (0.06)
- Asia > China (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Health & Medicine > Diagnostic Medicine > Imaging (0.66)
- Health & Medicine > Diagnostic Medicine > Lab Test (0.54)
- Health & Medicine > Therapeutic Area > Oncology (0.46)
Large Language Multimodal Models for 5-Year Chronic Disease Cohort Prediction Using EHR Data
Ding, Jun-En, Thao, Phan Nguyen Minh, Peng, Wen-Chih, Wang, Jian-Zhe, Chug, Chun-Cheng, Hsieh, Min-Chen, Tseng, Yun-Chien, Chen, Ling, Luo, Dongsheng, Wang, Chi-Te, Chen, Pei-fu, Liu, Feng, Hung, Fang-Ming
Chronic diseases such as diabetes are the leading causes of morbidity and mortality worldwide. Numerous research studies have attempted diagnosis with various deep learning models. However, most previous studies had certain limitations, including the use of publicly available datasets (e.g., MIMIC) and imbalanced data. In this study, we collected five-year electronic health records (EHRs) from a Taiwan hospital database, including 1,420,596 clinical notes, 387,392 laboratory test results, and more than 1,505 laboratory test items, for research on pre-training large language models. We proposed a novel Large Language Multimodal Models (LLMMs) framework incorporating multimodal data from clinical notes and laboratory test results for the prediction of chronic disease risk. Our method combined a text embedding encoder and a multi-head attention layer to learn laboratory test values, utilizing a deep neural network (DNN) module to merge blood features with chronic disease semantics into a latent space. In our experiments, we observe that ClinicalBERT and PubMedBERT, when combined with attention fusion, can achieve an accuracy of 73% in multiclass chronic disease and diabetes prediction. By transforming laboratory test values into textual descriptions and employing the Flan-T5 model, we achieved a 76% Area Under the ROC Curve (AUROC), demonstrating the effectiveness of leveraging numerical text data for training and inference in language models. This approach significantly improves the accuracy of early-stage diabetes prediction.
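The abstract's key trick, serializing numeric lab values into text a language model can read, can be sketched as below. The sentence template and the reference thresholds are illustrative assumptions, not the paper's exact scheme.

```python
# Hypothetical serialization of numeric lab results into flagged text sentences,
# the kind of input a text-only language model can consume directly.

REFERENCE = {"HbA1c": (4.0, 5.6, "%"), "glucose": (70, 99, "mg/dL")}  # toy ranges

def labs_to_text(labs: dict) -> str:
    """Render {test: value} as sentences with a high/normal/low flag."""
    parts = []
    for test, value in labs.items():
        low, high, unit = REFERENCE[test]
        flag = "high" if value > high else "low" if value < low else "normal"
        parts.append(f"{test} is {value} {unit} ({flag})")
    return "; ".join(parts) + "."

print(labs_to_text({"HbA1c": 7.2, "glucose": 130}))
# HbA1c is 7.2 % (high); glucose is 130 mg/dL (high).
```

Turning values into flagged sentences lets a pretrained text model reuse its language understanding on numeric data, which is the mechanism behind the reported Flan-T5 gains.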
- Asia > Taiwan > Taiwan > Taipei (0.05)
- North America > United States > New York (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (2 more...)
- Health & Medicine > Health Care Technology > Medical Record (1.00)
- Health & Medicine > Diagnostic Medicine > Lab Test (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.95)
Explainable Machine Learning for ICU Readmission Prediction
de Sá, Alex G. C., Gould, Daniel, Fedyukova, Anna, Nicholas, Mitchell, Dockrell, Lucy, Fletcher, Calvin, Pilcher, David, Capurro, Daniel, Ascher, David B., El-Khawas, Khaled, Pires, Douglas E. V.
The intensive care unit (ICU) comprises a complex hospital environment, where decisions made by clinicians carry a high level of risk for patients' lives. A comprehensive care pathway must then be followed to reduce complications. Uncertain, competing and unplanned aspects within this environment increase the difficulty of uniformly implementing the care pathway. Readmission contributes to this pathway's difficulty, occurring when patients are admitted again to the ICU in a short timeframe, resulting in high mortality rates and high resource utilisation. Several works have tried to predict readmission from patients' medical information. Although they have some level of success in predicting readmission, those works do not properly assess, characterise and understand readmission prediction. This work proposes a standardised and explainable machine learning pipeline to model patient readmission on a multicentric database (i.e., the eICU cohort with 166,355 patients, 200,859 admissions and 6,021 readmissions) while validating it in monocentric (i.e., the MIMIC IV cohort with 382,278 patients, 523,740 admissions and 5,984 readmissions) and multicentric settings. Our machine learning pipeline achieved predictive performance in terms of the area under the receiver operating characteristic curve (AUC) of up to 0.7 with a Random Forest classification model, yielding overall good calibration and consistency on validation sets. From the explanations provided by the constructed models, we could also derive a set of insightful conclusions, primarily on variables related to vital signs and blood tests (e.g., albumin, blood urea nitrogen and hemoglobin levels), demographics (e.g., age, and admission height and weight), and ICU-associated variables (e.g., unit type). These insights provide an invaluable source of information during clinicians' decision-making when discharging ICU patients.
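One common way to extract the kind of variable-level insights the abstract mentions is permutation importance: shuffle one feature and measure how much performance drops. The sketch below shows the idea with a toy scoring rule in place of the paper's Random Forest; the feature names, data, and threshold are invented.

```python
# Sketch of permutation importance: the drop in a metric after shuffling one
# feature column estimates that feature's contribution to the model.
import random

def permutation_importance(model, X, y, feature_idx, metric, seed=0):
    """model: row -> prediction; metric: (y_true, y_pred) -> score."""
    rng = random.Random(seed)
    base = metric(y, [model(row) for row in X])
    shuffled_col = [row[feature_idx] for row in X]
    rng.shuffle(shuffled_col)
    X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
              for row, v in zip(X, shuffled_col)]
    return base - metric(y, [model(row) for row in X_perm])

accuracy = lambda y_true, y_pred: sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

# Toy "model": flags readmission when albumin (feature 0) is low; feature 1 is unused.
model = lambda row: int(row[0] < 3.5)
X = [[3.0, 80], [4.1, 60], [2.9, 75], [4.5, 70]]
y = [1, 0, 1, 0]
print(permutation_importance(model, X, y, 0, accuracy))   # albumin matters
print(permutation_importance(model, X, y, 1, accuracy))   # 0.0 (unused feature)
```

An unused feature always scores exactly zero here, which is the sanity check that makes this kind of explanation trustworthy when ranking vital signs and blood test variables.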
- North America > United States (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Oceania > New Zealand (0.04)
- (4 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Health Care Providers & Services (1.00)
- Health & Medicine > Diagnostic Medicine > Lab Test (0.34)